1 Introduction

1.1 Background

Heart failure (HF) is a chronic cardiac disease that has progressively severe implications on the patient’s quality of life, and it has been a leading cause of death and a significant contributor to the global burden of disease.1 The pathophysiology of HF is complex and involves multiple factors, but a major hallmark of HF is the progressive degeneration of the myocardium that is accompanied by cardiac fibrosis.2 34 During the progression of HF, a process known as cardiac remodelling progressively replaces the myocardium with fibrous tissues, which in turn leads to the gradual stiffening of the heart ventricular wall. Thus, the identification of profibrotic pathways are essential to the development of therapeutic strategies to delay or even to prevent the profibrotic remodelling of the heart.

Previous studies have shown that the straining condition has played a significant role in potently altering the Hippo pathways.5 It was found that mechanical stress on the cell cytoskeleton induces the nuclear entry of the YAP/TAZ complex, which subsequently activates the proliferative pathways.6 7 Although such a mechanism may seem to be beneficial for the restoration of the damaged cardiomyocytes, this pathway has been shown to be more active in cardiac stromal cells, which can subsequently differentiate into myofibroblasts that eventually leads to the profibrotic remodelling of the heart.2 Therefore, directly suppressing the YAP/TAZ complex in cardiac stromal cells may be an effective strategy. Recently, a study by Garoffolo et al. investigated the use of a small molecular inhibitor, verteporfin (VTP), to suppress the YAP/TAZ complex with or without the presence of TGF-β1, a profibrotic cytokine under the maximal strain condition.

1.2 Objectives

The study by Garoffolo et al. aimed to examine the antifibrotic effect of VTP in cardiac stromal cells under profibrotic signals that simulates the straining conditions under heart failure as part of their larger objective.2

Here, we focus on this particular aspect of the research discussed above. In particular, we aim to use RNA-seq data to identify the changes in the transcriptional profile of cardiac stromal cells after the use of VTP. Figure 1. shows the experimental design of the study. Here, we perform differential gene expression analysis comparing the transcriptome of the cells treated with VTP with control under the known profibrotic straining condition. Additionally, we test the efficacy of the drug (VTP) under the presence of TGF-β1, a potent profibrotic cytokine.8 To investigate whether direct inhibition of the YAP/TAZ complex with VTP, we perform threshold over-representation analysis to identify enriched pathways in the differentially expressed genes, which would allow us to identify whether VTP can effectively reverse a profibrotic phenotype.

Figure 1. Schematics of RNAseq experiment design. Samples were cultured under maximal mechanical strains. Six independent cell cultures were used for each treatment condition. cSt-Cs: cardiospheres-derived primitive cardiac stromal cells; VTP: verteporfin; TGF-β1: transforming growth factor beta-1. Figure adapted from Assignment 1 and created with BioRender.com.

1.3 Dataset

The data used in the present analysis is retrieved from the GEO database under the accession number GSE203358. The dataset contains RNA-seq data from 24 samples, which are evenly split into 4 groups: - Control (maximal strain condition) - Verteporfin (VTP) treatment (maximal strain condition) - TGF-β1 treatment (maximal strain condition) - Verteporfin (VTP) treatment with TGF-β1 (maximal strain condition)

1.4 Previous Analysis

Since we have previously performed initial data cleaning and normalization, here we provide a brief summary of the previous analysis. The entire process can be found here.

1.4.1 Data QC and Filtering

The original dataset is composed of 24 samples and 60583genes. After removing genes with low counts, the dataset we use in the present analysis is composed of 14814 genes. The reference genome used is the human GRCh38 genome.

1.4.2 Identifier Mapping and Normalization

The original dataset contains a mix of HUGO gene symbols and GenBank identifiers. We used the biomaRt package to map the rest of the identifiers to HUGO gene symbols.9 The dataset is normalized using the Trimmed Mean of M-values (TMM) method implemented by the edgeR package.10 After applying normalization, the library size for each sample has been effectively adjusted such that the distribution of gene expressions are comparable across samples.

1.4.3 Multidimensional Scaling

We evaluated the similarity between the samples by accessing the clustering of the samples (as well as replicates within each group) using multidimensional scaling (MDS). The MDS plot shown in Figure 2. reveals a moderate clustering of samples based on the treatment conditions, and in particular, the control samples under mechanical strain alone are clustered separately with any other groups. The other treatment groups show less separation along the first dimension, but clustering is still observed. Interestingly, although most of the samples are clustered based on the treatment conditions and that we did not observe any clustering based on the cell culture replicates, we notice that certain samples that do not cluster tightly with the rest in the same treatment group seem to be consistent across the conditions. For example, comparing to other replicates, in cells from culture M89 and M99, different conditions seem to have a smaller effect on the gene expression profile.

Figure 2. Multidimensional scaling plot of cSt-Cs gene expression across samples. All samples are coloured by treatment condition. The first two dimensions are shown. Moderate clustering by treatment and low clustering by replicate for a small number of cultures is observed. Figure adapted from Assignment 1.

2 Preparations

2.1 Package Dependencies

The current R Notebook assumes that it is being run on the BCB420-2023 base image. Additional packages required are noted below.

# edgeR package for manupulating and analyzing RNAseq data
if (!requireNamespace("edgeR", quietly = TRUE)) {
    BiocManager::install("edgeR")
}

# limma package for differential expression analysis
if (!requireNamespace("limma", quietly = TRUE)) {
    BiocManager::install("limma")
}

# ComplexHeatmap package for creating heatmaps
if (!requireNamespace("ComplexHeatmap", quietly = TRUE)) {
    BiocManager::install("ComplexHeatmap")
}

# circlize package for circular visualization
if (!requireNamespace("circlize", quietly = TRUE)) {
    BiocManager::install("circlize")
}

# ggplot2 package for creating plots
if (!requireNamespace("ggplot2", quietly = TRUE)) {
    install.packages("ggplot2")
}

# KableExtra package for creating tables and formatting
if (!requireNamespace("kableExtra", quietly = TRUE)) {
    install.packages("kableExtra")
}

# dplyr package for data manipulation
if (!requireNamespace("dplyr", quietly = TRUE)) {
    install.packages("dplyr")
}

# DT package for creating interactive tables
if (!requireNamespace("DT", quietly = TRUE)) {
    install.packages("DT")
}

# Load the dplyr package
library("dplyr")

# Load the ggplot2 package
library("ggplot2")

# We will consistently use package::function() to avoid confusion However, the
# dplyr package has to be loaded first to allow definitions of various
# operators such as %>%

2.2 Data Import

In the previous analysis, we have saved the processed data output to a file. Hence, we can simply load the data from the file and perform the analysis.

# Load the processed data
counts <- read.csv(file = "./Data/expr_norm_counts.csv", header = TRUE, row.names = 1)

# View the first few rows of the data
head(counts)

Table 1. Processed count data used for the present analysis. The data is composed of 14814 genes and 24 samples, where each gene is identified by a unique HUGO symbol. The first few rows are shown.

Since the header of the data contains the sample information, we will extract the sample information from the header to reconstruct the experimental design.

# Extract the sample information from the header
sample_matrix <- colnames(counts) %>%
    lapply(function(x) {
        unlist(strsplit(x, split = "\\."))
    }) %>%
    as.data.frame()

# Rename the data frame
colnames(sample_matrix) <- colnames(counts)
rownames(sample_matrix) <- c("culture", "treatment")

# Transpose the data frame so that each row represents a unique sample and each
# column represents a unique sample attribute
sample_matrix <- data.frame(t(sample_matrix))

# View the sample information
DT::datatable(sample_matrix[order(sample_matrix$treatment), ], rownames = TRUE, filter = "top",
    options = list(pageLength = 6)) %>%
    # Changing the font size of content
DT::formatStyle(names(sample_matrix), fontSize = "12px") %>%
    # Changing the font size of row names (sample names)
DT::formatStyle(0, fontSize = "12px")

Table 2. Sample information used for the present analysis. A total of 24 samples are included in the analysis, where each sample is identified by a unique culture ID and treatment condition. Cells are retrieved from 6 independent cultures, and each is used for 4 different treatment conditions.

3 Differential Gene Expression Analysis

Understanding the antifibrotic effect of VTP requires a comprehensive analysis of the gene expression profile of the cSt-Cs by identifying the transcriptional changes with or without the treatment of VTP. To this end, we will perform two separate differential gene expression analysis with or without the presence of TGF-β1, in which we can isolate the effect of VTP as the only independent variable.

However, although our data is composed of a total of 4 treatment condition, which allows for the comparison between control (strain alone) and TGF-β1, we will not perform such analysis. It has been well documented that TGF-β1 is a potent profibrotic cytokine that will induce a series of transcriptional activation on profibrotic pathways.11 The authors of the original publication has also verified the profibrotic effect of TGF-β1.8 We have also shown that TGF-β1 significantly upregaltes the expression of profibrotic genes (see associated journal entry for details). Therefore, based on our objective, we believe that it is more valuable to investigate the effect of VTP under maximal strain, both with and without the presence of TGF-β1 separately.

3.1 Global Gene Expression Profiling

Before we perform the differential gene expression analysis, we first want to observe the global expression change at the gene level. Here, we plot a heatmap for all the genes in the dataset to visualize the expression change across all samples under various conditions.

# Creating a DGEList object
counts_dge <- edgeR::DGEList(counts = counts, group = sample_matrix$treatment)

# As discussed in the associated journal entry, the data we use are the counts,
# rather than the counts per million (CPM) values. Therefore, we will need to
# convert the count to CPM values for normalization on library size.
counts_dge <- edgeR::calcNormFactors(counts_dge)
counts_cpm_all <- edgeR::cpm(counts_dge)
# Create a heatmap matrix for all the genes by scaling the CPM values This
# allows the heatmap colour to be on a consistent and continuous scale across
# all the samples
heatmap_matrix <- counts_cpm_all %>%
    t() %>%
    scale()

# Convert it back to the original format
heatmap_matrix <- t(heatmap_matrix)
# Checking the range of the heatmap matrix
hmap_range <- range(heatmap_matrix)
hmap_range
## [1] -3.54320  4.65571

The range of the heatmap matrix is between -3.5432004 and 4.6557105. Since the scaled expression value ranges from negative to positive, we will define a three colour palette for the heatmap, where we will use blue to represent the downregulated genes and red for upregulated genes. In the middle of the scale is white.

Here, heatmap creation and visual representation are implemented using the ComplexHeatmap and circlize packages, respectively.12 13

# Set the 3 color palette based on the scaled heatmap matrix
hmap_col <- circlize::colorRamp2(c(hmap_range[1], 0, hmap_range[2]), c("blue", "white",
    "red"))

Due to the relatively large number of samples, we will also apply annotations to the heatmap to better visualize the clustering of the samples. In particular, we will annotate the heatmap by the treatment condition and culture replicate. This would better allows us to identify whether any of the variables have a significant effect on the clustering of the samples.

# Sort the sample matrix in the same order as the heatmap matrix
sample_matrix_sorted <- sample_matrix[colnames(heatmap_matrix), ]

# Treatment Conditions
treatments <- unique(sample_matrix_sorted$treatment)

# Define 4 colours for the 4 treatment conditions
treatment_col <- hcl.colors(4, palette = "viridis")
names(treatment_col) <- treatments

# Culture Replicates
cultures <- unique(sample_matrix_sorted$culture)

# Define 6 colours for the 6 culture replicates
culture_col <- hcl.colors(6, palette = "set3")
names(culture_col) <- cultures

# Create an annotation dataframe
anno_df <- data.frame(treatment = sample_matrix_sorted$treatment, culture = sample_matrix_sorted$culture)
anno_col <- list(treatment = treatment_col, culture = culture_col)

# Create the heatmap annotation
hmap_anno <- ComplexHeatmap::HeatmapAnnotation(df = anno_df, col = anno_col)

To better visualize the heatmap, we cluster the genes into 4 groups using the k-means clustering algorithm. This allows the genes that have similar expression profiles to be grouped together.14 12 Additionally, since our dataset consists of 4 treatment conditions, it is expected that a clustering of 4 groups will reveal some interesting patterns in the expression profiles.

# Generate the heatmap Set the seed for reproducibility due to he inherent
# randomness of the k-means clustering algorithm
set.seed(91711)
hmap_all <- ComplexHeatmap::Heatmap(heatmap_matrix, show_row_dend = TRUE, show_column_dend = TRUE,
    show_column_names = TRUE, show_row_names = FALSE, col = hmap_col, top_annotation = hmap_anno,
    km = 4)
set.seed(NULL)

# Plot the heatmap
hmap_all

Figure 3. Heatmap of global gene expression across all samples of different treatments. A k-mean clustering of 4 groups was applied to the heatmap for clusterization of gene with similar expression profiles. Samples are hierarchically clustered by their expression profiles. The top bar annotate the samples by treatment conditions and culture in which the cells were extracted.

The 4 groups of genes generated by the k-means clustering algorithm show comparatively distinct expression profiles that define 4 subset (cluster) of samples. However, each of the 4 groups of samples (as indicated by the dendrogram) consists of a mix of samples from different treatment conditions. Based on the observation, we would not expect a very high number of differentially expressed gene between the conditions. However, despite the mix of treatment conditions within each cluster, it is still observable that samples under certain conditions tend to cluster together. For example, the left most cluster consists of samples that are treated with VTP. We can also observe a clustering of a subset of control cells and those that are treated with TGF-β1 alone. Since these two conditions are expected to show a profibrotic transcriptional programme, it is expected that these cells share similar expression patterns.

In addition, the annotation based on the culture replicate also shows a moderate clustering of the samples taken from the same cSt-Cs culture. This means that the culture in which the cells were extract likely contributes to the variance in the gene expression. The observation here using the heatmap is consistent with the distribution of the samples from the MDS plot in Figure 2. Therefore, we will take this into account when we perform build the model for differential expression analysis.

3.2 Model Selection

For differential gene expression analysis, we use the edgeR package, which implements a generailized linear model (GLM) to test for differential expression.10 Based on the objective of our analysis, we will build the model based on the treatment conditions. Additionally, since we have observed a subset of cultures replicates that shows a lower variance within cells from the same culture (Figure 2.) and a moderate degree of clustering in the heatmap (Figure 3.), it is expected that the culture replicate will also contribute to the variance in gene expression. Therefore, we choose the model to include both factors, treatment and culture replicate.

# Extract the factors for the model
treatments <- sample_matrix$treatment
cultures <- sample_matrix$culture

# Create the model
model_treat_cul <- model.matrix(~0 + treatments + cultures)
rownames(model_treat_cul) <- rownames(sample_matrix)

# View the model matrix
DT::datatable(model_treat_cul[order(gsub("M[0-9]{2}\\.", "", rownames(model_treat_cul))),
    ], rownames = TRUE, options = list(pageLength = 6, scrollX = TRUE)) %>%
    # Changing the font size of content
DT::formatStyle(names(model_treat_cul), fontSize = "12px")

Table 3. Model design for differential expression analysis. The model includes both treatment and culture replicate as factors for fitting the generalized linear model.

Furthermore, edgeR implements dispersion estimation with the negative binomial model, which estimates the global variation across the samples in a biological group.15 Correct estimation of dispersion not only allows accurate assessment of the quality of the data, but it is also a prerequisite for the differential expression analysis.15 10 Here we estimate and assess the dispersion using the estimateDisp function.

# Estimate the dispersion
counts_dge <- edgeR::estimateDisp(counts_dge, model_treat_cul)

# Plot the dispersion against the mean
edgeR::plotMeanVar(counts_dge, show.raw.vars = TRUE, show.tagwise.vars = TRUE, NBline = TRUE,
    show.binned.common.disp.vars = TRUE)

Figure 4. Mean-variance relationship of all genes across various treatment conditions. The dispersion is estimated using the negative binomial model. A blue line indicates a trended line based on the negative binomial model, with pooled and estimated gene-wise dispersion shown in gray and light blue, respectively. A black line with a slope of 1 represents the expected variance of a Poisson distribution, which assumes that the variance is equal to the mean.


Figure 4. illustrates the mean-variance relationship of the genes in the dataset across all genes and samples.16 A trended line based on the negative binomial model shows an increasing variance correlating to higher mean expression, suggesting the existence of biological variation. The pooled gene-wise dispersion is similar to that of the estimated dispersion, and both of which align relatively closely with the trend line.

3.3 Differential Expression

Based on the model design and the estimated dispersion above, two differential gene expression analysis are performed separately, comparing the transcriptomic changes involved after the addition of VTP to the cells under maximal strain with and without the presence of TGF-β1. Here, we employ the gene-wise Quasi-likelihood F test with the negative binomial generalized linear model to test for differential expression. We choose the method because it has been suggested to account for gene-specific variations due to biological and technical factors and that it also reflects the uncertainty in dispersion estimation.15 This is recommended for bulk RNA-seq dataset with a complex design (i.e., the two-factor model we employed in this analysis, as shown in Table 3).15 10

3.3.1 Effect of VTP under Maximal Strain

First, we examine the effect of VTP in comparison to control without the presence of TGF-β1. Based on the model, we establish a contrast comparing the treatment of VTP alone to the control. This is implemented in the limma package.17

# Create a contrast for VTP vs. control
contrast_vtp <- limma::makeContrasts("treatmentsVTP - treatmentsCTRL", levels = model_treat_cul)

We then perform the differential expression analysis by fitting the generalized linear model with the factors we designed and performing the Quasi-likelihood F test as described above.

# Fit the model
glm_fit_treat_cul <- edgeR::glmQLFit(counts_dge, model_treat_cul)

# Perform the differential expression analysis
qlf_vtp_ctrl <- edgeR::glmQLFTest(glmfit = glm_fit_treat_cul, contrast = contrast_vtp)
# Extract the results
results_vtp_ctrl <- edgeR::topTags(qlf_vtp_ctrl, sort.by = "PValue", n = nrow(counts_dge))$table

# Save the differential expression results
write.csv(results_vtp_ctrl, file = "./Data/dge_vtp_ctrl.csv", row.names = TRUE)

We will now examine the genes that are differentially expressed.

# Differentially expressed genes based on unadjusted p-value
sum(results_vtp_ctrl$PValue < 0.05)
## [1] 3651

Among the 14814 genes in the dataset, 3651 are considered to be differentially expressed based on the unadjusted p-value with a threshold of 0.05. This is consistent with other RNA-seq studies that have a focus on heart gene expression.18 Additionally, a threshold of 0.05 is also a common choice, since it is a conservative threshold that does not post a high risk of false positives and is not too stringent that may mask other biologically relevant genes.

Since paring RNA-seq expression data involves multiple hypothesis testing, we also adjust the p-values using the Benjamini-Hochberg method. This method assumes that the power to detect differential expression is equally likely among all the tests, which is suitable in the case of RNA-seq data, where the assumption is that most genes are not differentially expressed.19

# Differentially expressed genes based on adjusted p-value
sum(results_vtp_ctrl$FDR < 0.05)
## [1] 1040

Ater adjusting for multiple hypothesis testing, a total of 1040 genes are considered to be truly differentially expressed. Since it is known that VTP is a direct inhibitor of the TAP/TAZ complex, which is involved in multiple pathways such as cell proliferation and migration, we expect that a certain amount of genes should be differentially expressed.2 7

# Retrieve the set of differentially expressed genes
dge_vtp_ctrl <- results_vtp_ctrl[results_vtp_ctrl$FDR < 0.05, ]

3.3.1.1 Up-regulated Genes under VTP Treatment

# Up-regulated genes
vtp_ctrl_up <- dge_vtp_ctrl[dge_vtp_ctrl$logFC > 0, ]

# We add gene symbol as a column so that it is searchable
vtp_ctrl_up$Gene = rownames(vtp_ctrl_up)
vtp_ctrl_up <- vtp_ctrl_up[, c(ncol(vtp_ctrl_up), 1:ncol(vtp_ctrl_up) - 1)]

# Sort by fold change and visualize with a table
vtp_ctrl_up <- vtp_ctrl_up[order(vtp_ctrl_up$logFC, decreasing = TRUE), ]
DT::datatable(vtp_ctrl_up, rownames = FALSE, filter = "top", options = list(pageLength = 5,
    scrollX = TRUE)) %>%
    # Changing the font size of content
DT::formatStyle(colnames(vtp_ctrl_up), fontSize = "12px") %>%
    # Changing the font size of row names (sample names)
DT::formatStyle(0, fontSize = "12px")

Table 4. Up-regulated genes under VTP treatment comparing to control without TGF-β1.

3.3.1.2 Down-regulated Genes under VTP Treatment

# Down-regulated genes
vtp_ctrl_down <- dge_vtp_ctrl[dge_vtp_ctrl$logFC < 0, ]

# We add gene symbol as a column so that it is searchable
vtp_ctrl_down$Gene = rownames(vtp_ctrl_down)
vtp_ctrl_down <- vtp_ctrl_down[, c(ncol(vtp_ctrl_down), 1:ncol(vtp_ctrl_down) - 1)]

# Sort by fold change and visualize with a table
vtp_ctrl_down <- vtp_ctrl_down[order(vtp_ctrl_down$logFC, decreasing = FALSE), ]
DT::datatable(vtp_ctrl_down, rownames = FALSE, filter = "top", options = list(pageLength = 5,
    scrollX = TRUE)) %>%
    # Changing the font size of content
DT::formatStyle(colnames(vtp_ctrl_down), fontSize = "12px") %>%
    # Changing the font size of row names (sample names)
DT::formatStyle(0, fontSize = "12px")

Table 5. Down-regulated genes under VTP treatment comparing to control without TGF-β1.

3.3.1.3 Analysis and Visualization

Interestingly, there we found 605 up-regulated genes, which is more than the number of down-regulated genes, 435. The top upregulated genes are VWA5B2, BCYRN1, UNC13A, SLC14A1, PDE4C, which are not associated with fibrosis or cell proliferation. Rather, the top downregulated gene is COL11A1, with a fold change of 18.11. This gene encodes collagen type XI alpha 1 chain. which directly involves in the formation of collagen fibrils.20 We can see that a total of 0 collagen genes are downregualed, with none of them upregulated. This suggests that VTP reduces the expression of collagen genes, which are the main components of fibrotic tissues. We also found that P4HA1 gene is among the most significantly downregualted genes. The gene enxcodes a key subunit of the prolyl 4-hydroxylase complex, which is involved in a key step of collagen synthesis.21 Among the upregulated genes, we found multiple SLC family genes, which are involved in membrane transport and are likely to contribute to cellular respirations, which are known to be reduced in fibrotic tissues.22

Here we visualize the differentially expressed genes using a Volcano plot, which allows for easy visualization of the genes accoding to their fold change and significance.

# Label genes as up/down-regulated
results_vtp_ctrl <- results_vtp_ctrl %>%
    dplyr::mutate(Expression = dplyr::case_when(logFC > 0 & FDR < 0.05 ~ "Up-regulated",
        logFC < 0 & FDR < 0.05 ~ "Down-regulated", TRUE ~ "Not DE"))

# Extract a sublist of inportant genes to label in the plot Subset
# differentially expressed genes
results_deg <- results_vtp_ctrl[results_vtp_ctrl$FDR < 0.05, ]

# 1. Collagen genes
genes_to_label <- results_deg[grepl("COL", rownames(results_deg)), ]
# 2. Prolyl 4-hydroxylase subunit
genes_to_label <- rbind(genes_to_label, results_deg[grepl("P4HA1", rownames(results_deg)),
    ])
# 3. Top 3 up and down-regulated genes based on fold change
results_sort_by_FC <- results_deg[order(results_deg$logFC), ]
genes_to_label <- rbind(genes_to_label, results_sort_by_FC[1:3, ], results_sort_by_FC[(nrow(results_sort_by_FC) -
    2):nrow(results_sort_by_FC), ])

# Remove any duplicated genes in the subset
genes_to_label <- genes_to_label[!duplicated(rownames(genes_to_label)), ]
# We will plot all genes, with highlighted genes representing the
# differentially expressed genes Color genes based on whether it is
# significantly up/down-regulated Label the genes of interest
ggplot2::ggplot(results_vtp_ctrl, aes(logFC, -log(FDR, 10))) + ggplot2::geom_point(aes(color = Expression),
    size = 2/5) + xlab(expression("log"[2] * "FC")) + ylab(expression("-log"[10] *
    "FDR")) + ggplot2::scale_color_manual(values = c(`Up-regulated` = "firebrick3",
    `Down-regulated` = "dodgerblue3", `Not DE` = "gray50")) + ggplot2::guides(color = ggplot2::guide_legend(override.aes = list(size = 1.5))) +
    ggplot2::geom_label(data = genes_to_label, mapping = aes(logFC, -log(FDR, 10),
        label = rownames(genes_to_label)), size = 2, color = "black", nudge_x = 0.1,
        nudge_y = 0.1)

Figure 5. Volcano plot of differentially expressed genes comparing VTP treatment to control without TGF-β1. Differentially expressed genes are coloured based on direction of change. Genes of interest are labelled.

To better visualize the differentially expressed genes between the two test condition, a heatmap is plotted to support the visualization of gene expression, and particularly the clustering of samples.

# Extract the differentially expressed genes from the heatmap matrix
heatmap_matrix_vtp_ctrl <- heatmap_matrix[rownames(dge_vtp_ctrl), grepl("\\.VTP|\\.CTRL",
    colnames(heatmap_matrix))]

# Checking the range of the heatmap matrix
hmap_range <- range(heatmap_matrix_vtp_ctrl)

# Set the 3 color palette based on the scaled heatmap matrix
hmap_col <- circlize::colorRamp2(c(hmap_range[1], 0, hmap_range[2]), c("blue", "white",
    "red"))

# Redefine 2 colours for the 4 treatment conditions
treatment_col <- hcl.colors(2, palette = "viridis")
names(treatment_col) <- c("VTP", "CTRL")

# Create an annotation dataframe
anno_df <- data.frame(treatment = sample_matrix_sorted[grepl("\\.VTP|\\.CTRL", rownames(sample_matrix_sorted)),
    "treatment"], culture = sample_matrix_sorted[grepl("\\.VTP|\\.CTRL", rownames(sample_matrix_sorted)),
    "culture"])
anno_col <- list(treatment = treatment_col, culture = culture_col)

# Create the heatmap annotation
hmap_anno <- ComplexHeatmap::HeatmapAnnotation(df = anno_df, col = anno_col)

# Plot the heatmap
hmap_vtp_ctrl <- ComplexHeatmap::Heatmap(heatmap_matrix_vtp_ctrl, show_row_dend = TRUE,
    show_column_dend = TRUE, show_column_names = TRUE, show_row_names = FALSE, col = hmap_col,
    top_annotation = hmap_anno)
hmap_vtp_ctrl

Figure 6. Heatmap of differentially expressed genes comparing VTP treatment to control without TGF-β1. Samples are annotated based on treatment conditions and culture replicates.

3.3.2 Effect of VTP under Maximal Strain with Profibrotic TGF-β1

Additionally, we want to investigate whether VTP can still negates the fibrotic effect under the potent profibrotic stimulus of TGF-β1. Therefore, we also establish a contrast comparing the treatment of VTP to control where both conditions are also subjected to the same TGF-β1 treatment.

# Create a contrast for VTP vs. control with TGF-β1
contrast_vtp_tgf <- limma::makeContrasts("treatmentsTGFb_VTP - treatmentsTGFb", levels = model_treat_cul)

# Perform differential expression analysis
results_vtp_tgf <- edgeR::glmQLFTest(glmfit = glm_fit_treat_cul, contrast = contrast_vtp_tgf)
# Extract the results
results_vtp_tgf <- edgeR::topTags(results_vtp_tgf, sort.by = "PValue", n = nrow(counts_dge))$table
# Save the differential expression results
write.csv(results_vtp_ctrl, file = "./Data/dge_vtp_tgf.csv", row.names = TRUE)

There are 14 differentially expressed genes comparing VTP treatment to control in the presence of TGF-β1. Similar to the previous analysis, we used an threshold of 0.05 for the false discovery rate (FDR), as corrected by the Benjamini-Hochberg method, to determine whether a gene is considered differentially expressed. Among these differentially expressed genes, there are 3 genes that are up-regulated and 11 genes that are down-regulated, as shown below.

3.3.2.1 Up-regulated Genes under VTP Treatment

# Retrieve the set of differentially expressed genes
dge_vtp_tgf <- results_vtp_tgf[results_vtp_tgf$FDR < 0.05, ]

# Up-regulated genes
vtp_tgf_up <- dge_vtp_tgf[dge_vtp_tgf$logFC > 0, ]

# We add gene symbol as a column so that it is searchable
vtp_tgf_up$Gene <- rownames(vtp_tgf_up)
vtp_tgf_up <- vtp_tgf_up[, c(ncol(vtp_tgf_up), 1:(ncol(vtp_tgf_up) - 1))]

# Sort the genes by fold change and visualize using a table
vtp_tgf_up <- vtp_tgf_up[order(vtp_tgf_up$logFC, decreasing = TRUE), ]
DT::datatable(vtp_tgf_up, rownames = FALSE, filter = "top", options = list(pageLength = 5,
    scrollX = TRUE)) %>%
    DT::formatStyle(colnames(vtp_ctrl_up), fontSize = "12px") %>%
    DT::formatStyle(0, fontSize = "12px")

Table 6. Up-regulated genes under VTP treatment in the presence of TGF-β1.

3.3.2.2 Down-regulated Genes under VTP Treatment

# Down-regulated genes
vtp_tgf_down <- dge_vtp_tgf[dge_vtp_tgf$logFC < 0, ]

# We add gene symbol as a column so that it is searchable
vtp_tgf_down$Gene <- rownames(vtp_tgf_down)
vtp_tgf_down <- vtp_tgf_down[, c(ncol(vtp_tgf_down), 1:(ncol(vtp_tgf_down) - 1))]

# Sort the genes by fold change and visualize using a table
vtp_tgf_down <- vtp_tgf_down[order(vtp_tgf_down$logFC), ]
DT::datatable(vtp_tgf_down, rownames = FALSE, filter = "top", options = list(pageLength = 5,
    scrollX = TRUE)) %>%
    DT::formatStyle(colnames(vtp_ctrl_down), fontSize = "12px") %>%
    DT::formatStyle(0, fontSize = "12px")

Table 7. Down-regulated genes under VTP treatment in the presence of TGF-β1.

3.3.2.3 Analysis and Visualization

Since there are only 14 differentially expressed genes based on the corrected FDR threshold, the Volcano plot is will likely show mostly noise. However, we will only annotate the top 3 genes for both the up-regulated and down-regulated genes based on the fold change.

# Label genes as up-regulated or down-regulated
results_vtp_tgf <- results_vtp_tgf %>%
    dplyr::mutate(Expression = dplyr::case_when(logFC > 0 & FDR < 0.05 ~ "Up-regulated",
        logFC < 0 & FDR < 0.05 ~ "Down-regulated", TRUE ~ "Not DE"))

# Label differentially expressed genes
genes_to_label <- results_vtp_tgf[results_vtp_tgf$Expression != "Not DE", ]

# Plot the volcano plot Color genes based on whether it is significantly
# up/down-regulated
ggplot2::ggplot(results_vtp_tgf, aes(logFC, -log(FDR, 10))) + ggplot2::geom_point(aes(color = Expression),
    size = 2/5) + xlab(expression("log"[2] * "FC")) + ylab(expression("-log"[10] *
    "FDR")) + ggplot2::scale_color_manual(values = c(`Up-regulated` = "firebrick3",
    `Down-regulated` = "dodgerblue3", `Not DE` = "gray50")) + ggplot2::guides(color = ggplot2::guide_legend(override.aes = list(size = 1.5))) +
    ggplot2::geom_label(data = genes_to_label, mapping = aes(logFC, -log(FDR, 10),
        label = rownames(genes_to_label)), size = 2, color = "black", nudge_x = 0.1,
        nudge_y = 0.1)

Figure 7. Volcano plot of differentially expressed genes under VTP treatment in the presence of TGF-β1. Red and blue dots represent up-regulated and down-regulated genes, respectively. All differentially expressed genes are labeled.

# Extract the differentially expressed genes from the heatmap matrix
heatmap_matrix_vtp_tgf <- heatmap_matrix[rownames(dge_vtp_tgf), grepl("\\.TGFb_VTP|\\.TGFb",
    colnames(heatmap_matrix))]

# Check the range of the heatmap matrix
hmap_range <- range(heatmap_matrix_vtp_tgf)

# Set the 3 color palette based on the scaled heatmap matrix
hmap_col <- circlize::colorRamp2(c(hmap_range[1], 0, hmap_range[2]), c("blue", "white",
    "red"))

# Redefine 2 colours for the 4 treatment conditions
treatment_col <- hcl.colors(2, palette = "viridis")
names(treatment_col) <- c("TGFb_VTP", "TGFb")

# Create an annotation dataframe
anno_df <- data.frame(treatment = sample_matrix_sorted[grepl("\\.TGFb_VTP|\\.TGFb",
    rownames(sample_matrix_sorted)), "treatment"], culture = sample_matrix_sorted[grepl("\\.TGFb_VTP|\\.TGFb",
    rownames(sample_matrix_sorted)), "culture"])
anno_col <- list(treatment = treatment_col, culture = culture_col)

# Create the heatmap annotation
hmap_anno <- ComplexHeatmap::HeatmapAnnotation(df = anno_df, col = anno_col)

# Plot the heatmap
hmap_vtp_tgf <- ComplexHeatmap::Heatmap(heatmap_matrix_vtp_tgf, show_row_dend = TRUE,
    show_column_dend = TRUE, show_column_names = TRUE, show_row_names = TRUE, col = hmap_col,
    top_annotation = hmap_anno)
hmap_vtp_tgf

Figure 8. Heatmap of differentially expressed genes under VTP treatment in the presence of TGF-β1.

4 Threshold Over-representation Analysis

4.1 Effect of VTP under Maximal Strain

4.1.1 All Differentially Expressed Genes

4.1.2 Up-regulated Genes in VTP

4.1.3 Down-regulated Genes in VTP

4.2 Effect of VTP under Maximal Strain with Profibrotic TGF-β1

4.2.1 All Differentially Expressed Genes

4.2.2 Up-regulated Genes in VTP

4.2.3 Down-regulated Genes in VTP

5 Interpretation and Analysis

5.1 Differential Gene Expression Analysis

  • How many genes were significantly differentially expressed? What thresholds did you use and why?

  • Multiple hypothesis testing - correct your p-values using a multiple hypothesis correction method. Which method did you use? And Why? How many genes passed correction?

  • Show the amount of differentially expressed genes using an MA Plot or a Volcano plot. Highlight genes of interest.

  • Visualize your top hits using a heatmap. Do you conditions cluster together? Explain why or why not.

5.2 Threshold Over-representation Analysis

  • Which method did you choose and why?

  • What annotation data did you use and why? What version of the annotation are you using?

  • How many genesets were returned with what thresholds?

  • Run the analysis using the up-regulated set of genes, and the down-regulated set of genes separately. How do these results compare to using the whole list (i.e all differentially expressed genes together vs. the up-regulated and down regulated differentially expressed genes separately)?

5.3 Result Interpretation

  • Do the over-representation results support conclusions or mechanism discussed in the original paper?

  • Can you find evidence, i.e. publications, to support some of the results that you see. How does this evidence support your results.

References

1.
Heidenreich PA, Bozkurt B, Aguilar D, et al. 2022 AHA/ACC/HFSA guideline for the management of heart failure: A report of the american college of cardiology/american heart association joint committee on clinical practice guidelines. Journal of the American College of Cardiology. 2022;79(17):e263-e421.
2.
Garoffolo G, Casaburo M, Amadeo F, et al. Reduction of cardiac fibrosis by interference with YAP-dependent transactivation. Circulation Research. 2022;131(3):239-257.
3.
Porter KE, Turner NA. Cardiac fibroblasts: At the heart of myocardial remodeling. Pharmacology & therapeutics. 2009;123(2):255-278.
4.
Tallquist MD, Molkentin JD. Redefining the identity of cardiac fibroblasts. Nature Reviews Cardiology. 2017;14(8):484-491.
5.
Brusatin G, Panciera T, Gandin A, Citron A, Piccolo S. Biomaterials and engineered microenvironments to control YAP/TAZ-dependent cell behaviour. Nature materials. 2018;17(12):1063-1075.
6.
Elosegui-Artola A, Andreu I, Beedle AE, et al. Force triggers YAP nuclear entry by regulating transport across nuclear pores. Cell. 2017;171(6):1397-1410.
7.
Panciera T, Azzolin L, Cordenonsi M, Piccolo S. Mechanobiology of YAP and TAZ in physiology and disease. Nature reviews Molecular cell biology. 2017;18(12):758-770.
8.
Saadat S, Noureddini M, Mahjoubin-Tehran M, et al. Pivotal role of TGF-\(\beta\)/smad signaling in cardiac fibrosis: Non-coding RNAs as effectual players. Frontiers in cardiovascular medicine. 2021;7:588347.
9.
Durinck S, Spellman PT, Birney E, Huber W. Mapping identifiers for the integration of genomic datasets with the r/bioconductor package biomaRt. Nature protocols. 2009;4(8):1184-1191.
10.
Robinson MD, McCarthy DJ, Smyth GK. edgeR: A bioconductor package for differential expression analysis of digital gene expression data. bioinformatics. 2010;26(1):139-140.
11.
Walker EJ, Heydet D, Veldre T, Ghildyal R. Transcriptomic changes during TGF-\(\beta\)-mediated differentiation of airway fibroblasts to myofibroblasts. Scientific reports. 2019;9(1):20377.
12.
Gu Z, Eils R, Schlesner M. Complex heatmaps reveal patterns and correlations in multidimensional genomic data. Bioinformatics. 2016;32(18):2847-2849.
13.
Gu Z, Gu L, Eils R, Schlesner M, Brors B. Circlize implements and enhances circular visualization in r. Bioinformatics. 2014;30(19):2811-2812.
14.
Sharko J, Grinstein GG, Marx KA, et al. Heat map visualizations allow comparison of multiple clustering results and evaluation of dataset quality: Application to microarray data. In: 2007 11th International Conference Information Visualization (IV’07). IEEE; 2007:521-526.
15.
Lun AT, Chen Y, Smyth GK. It’s DE-licious: A recipe for differential expression analyses of RNA-seq experiments using quasi-likelihood methods in edgeR. Statistical genomics: methods and protocols. Published online 2016:391-416.
16.
McCarthy DJ, Chen Y, Smyth GK. Differential expression analysis of multifactor RNA-seq experiments with respect to biological variation. Nucleic acids research. 2012;40(10):4288-4297.
17.
Ritchie ME, Phipson B, Wu D, et al. Limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic acids research. 2015;43(7):e47-e47.
18.
Ahmed A, Syed JN, Chi L, et al. KDM8 epigenetically controls cardiac metabolism to prevent initiation of dilated cardiomyopathy. Nature Cardiovascular Research. Published online 2023:1-18.
19.
Korthauer K, Kimes PK, Duvallet C, et al. A practical guide to methods controlling false discoveries in computational biology. Genome biology. 2019;20(1):1-21.
20.
Moore-Morris T, Guimarães-Camboa N, Banerjee I, et al. Resident fibroblast lineages mediate pressure overload–induced cardiac fibrosis. The Journal of clinical investigation. 2014;124(7):2921-2934.
21.
Nwogu JI, Geenen D, Bean M, Brenner MC, Huang X, Buttrick PM. Inhibition of collagen synthesis with prolyl 4-hydroxylase inhibitor improves left ventricular function and alters the pattern of left ventricular dilatation after myocardial infarction. Circulation. 2001;104(18):2216-2221.
22.
Pizzagalli MD, Bensimon A, Superti-Furga G. A guide to plasma membrane solute carrier proteins. The FEBS journal. 2021;288(9):2784-2835.